Hoist collapse shape out of scf.forall when possible and expand its destination #19044
Conversation
Force-pushed from d58c313 to e44793b
Just some nits
So far this is looking good! For this first round of comments, I left notes about some potential edge cases and about some more tests that would be good to have.
Force-pushed from 9523904 to 48d5691
Force-pushed from 48d5691 to 02a2945
Looks good! The code looks a lot cleaner now too, nice work!
Hmm, looks like punet fails to compile with this. I think it must be one of the conv dispatches (since nothing else uses this pass), but it seems a bit strange to me that this causes a compilation failure.
Yeah, that's a bummer. One thing I can try is to see whether CI passes when I disable multi-result support; if it does, the issue might be that the getTiedResult function assumes the multiple parallel_insert_slice ops are ordered the same way as the results.
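To make that ordering concern concrete, here is a hand-written sketch (made-up shapes; tensor.empty stands in for the real tiles) where the parallel_insert_slice ops inside scf.forall.in_parallel are written in the opposite order of the shared_outs, so tying "result i" to "the i-th insert" would pair them up incorrectly:

```mlir
%destA = tensor.empty() : tensor<64xf32>
%destB = tensor.empty() : tensor<64xi8>
%tileA = tensor.empty() : tensor<16xf32>
%tileB = tensor.empty() : tensor<16xi8>
// %0#0 is tied to %a and %0#1 to %b, regardless of how the inserts below
// are ordered.
%0:2 = scf.forall (%i) = (0) to (64) step (16)
    shared_outs(%a = %destA, %b = %destB) -> (tensor<64xf32>, tensor<64xi8>) {
  scf.forall.in_parallel {
    // The inserts appear in the opposite order of the shared_outs.
    tensor.parallel_insert_slice %tileB into %b[%i] [16] [1]
        : tensor<16xi8> into tensor<64xi8>
    tensor.parallel_insert_slice %tileA into %a[%i] [16] [1]
        : tensor<16xf32> into tensor<64xf32>
  }
} {mapping = [#iree_codegen.workgroup_mapping<x>]}
```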
I took a look at the model, and found the dispatch that is failing. Here's a gist with a dump: https://gist.github.com/Max191/a092ec5c81e5f9f44aee19358f007a34

Looks like the problem is that the collapse_shape was hoisted out of the forall, but then could not fold with the flow.dispatch.tensor.store at the function boundary, because the store has an offset (from pad fusion). I'm not quite sure what the best solution for this is at the moment.

EDIT: snippet from the dump after reshape propagation:

```mlir
      } -> (tensor<1x2x8x16x4x16xf16>, tensor<1x2x8x16x4x16xi8>)
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %27#0 into %arg3[%arg0, %arg1, 0, 0, %arg2, 0] [1, 2, 8, 16, 4, 16] [1, 1, 1, 1, 1, 1] : tensor<1x2x8x16x4x16xf16> into tensor<2x128x8x16x20x16xf16>
        tensor.parallel_insert_slice %27#1 into %arg4[%arg0, %arg1, 0, 0, %arg2, 0] [1, 2, 8, 16, 4, 16] [1, 1, 1, 1, 1, 1] : tensor<1x2x8x16x4x16xi8> into tensor<2x128x8x16x20x16xi8>
      }
    } {mapping = [#iree_codegen.workgroup_mapping<z>, #iree_codegen.workgroup_mapping<y>, #iree_codegen.workgroup_mapping<x>]}
    %collapsed = tensor.collapse_shape %20#1 [[0], [1], [2, 3], [4, 5]] : tensor<2x128x8x16x20x16xi8> into tensor<2x128x128x320xi8>
    flow.dispatch.tensor.store %collapsed, %7, offsets = [0, 1, 1, 0], sizes = [2, 128, 128, 320], strides = [1, 1, 1, 1] : tensor<2x128x128x320xi8> -> !flow.dispatch.tensor<readwrite:tensor<2x130x130x320xi8>>
    flow.dispatch.tensor.store %20#0, %8, offsets = [0, 0, 0, 0, 0, 0], sizes = [2, 128, 8, 16, 20, 16], strides = [1, 1, 1, 1, 1, 1] : tensor<2x128x8x16x20x16xf16> -> !flow.dispatch.tensor<writeonly:tensor<2x128x8x16x20x16xf16>>
    return
  }
```
One way to fix this would be to check that the consumer of the scf.forall is a flow.dispatch.tensor.store with a full slice (i.e., zero offsets, unit strides, and source_sizes == target_sizes). This makes the pattern even more specific, but it is already quite specific, so maybe that is okay for now.
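For illustration, here is a hedged side-by-side of the two cases, adapted from the dump above (the SSA values are placeholders standing in for the surrounding dispatch): the first store is a full slice and would pass the proposed check; the second has nonzero offsets from pad fusion and a larger target bound, so the hoist would be skipped.

```mlir
// Full slice: zero offsets, unit strides, and sizes equal to the target
// bound, so a hoisted collapse_shape can still fold into the store.
flow.dispatch.tensor.store %collapsed, %out, offsets = [0, 0, 0, 0], sizes = [2, 128, 128, 320], strides = [1, 1, 1, 1] : tensor<2x128x128x320xi8> -> !flow.dispatch.tensor<writeonly:tensor<2x128x128x320xi8>>

// Not a full slice: the nonzero offsets (from the fused pad) and the larger
// 2x130x130x320 bound mean the proposed check would reject the hoist.
flow.dispatch.tensor.store %collapsed, %out2, offsets = [0, 1, 1, 0], sizes = [2, 128, 128, 320], strides = [1, 1, 1, 1] : tensor<2x128x128x320xi8> -> !flow.dispatch.tensor<readwrite:tensor<2x130x130x320xi8>>
```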
Makes sense. I guess that, theoretically, it should be possible to support this?
Following up here from a discussion with Nirvedh offline. We are going with the solution I proposed here: #19044 (comment). As long as the convolution has a consumer in the dispatch, the pipeline should be able to handle not hoisting the collapse_shape when the dispatch.tensor.store has offsets (i.e., when a pad consumer has been fused). If there is no consumer and the result of the convolution is stored directly with some offsets, then this will cause problems, but I don't think that case is likely to be seen. There is typically an operation between convolutions and pads, and pad fusion does not happen by default anyway. It is a case to be aware of, but I think it is fine to proceed with this additional restriction for now. In other words, the following case would be problematic if the pad is fused with the producer convolution:

```mlir
%conv = linalg.conv_2d_nhwc_hwcf ...
%pad = tensor.pad %conv ...
```

However, this seems like an uncommon case, and pad fusion only happens with aggressive fusion anyway.
Force-pushed from fdf1b03 to ddcaeda
Force-pushed from 900a1d3 to d11b745
Force-pushed from d11b745 to 8a3727c
…or all users (#19139) The existing pattern added in #19044 created different offsets for each user, even though we had previously checked that the offsets would be exactly the same. This prevented recursive application of the pattern, because the offset comparison in the next application of the pattern would fail. The change in this PR is tested by removing the cse from the test file that #19044 had added to work around this exact issue. Signed-off-by: Nirvedh Meshram <[email protected]>
This pattern hoists the collapse_shape producer of a tensor.parallel_insert_slice out of the loop and expands the enclosing scf.forall destination. This is only safe because we apply it to workgroup-mapped scf.forall ops, where the slices are disjoint. The reason to have this pattern is that it allows us to eliminate empty tensors during bufferization, which would otherwise be blocked by the collapse.
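For illustration, here is a hand-written before/after sketch of the rewrite (made-up shapes; tensor.empty stands in for the real tiled computation, and the offset arithmetic is just one way it could be expressed), not output from the actual pass:

```mlir
// Before: the tile is collapsed inside the workgroup-mapped forall and then
// inserted into a collapsed destination.
%dest = tensor.empty() : tensor<64x32xf32>
%tile = tensor.empty() : tensor<4x4x32xf32>
%0 = scf.forall (%i) = (0) to (64) step (16)
    shared_outs(%out = %dest) -> (tensor<64x32xf32>) {
  %c = tensor.collapse_shape %tile [[0, 1], [2]]
      : tensor<4x4x32xf32> into tensor<16x32xf32>
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %c into %out[%i, 0] [16, 32] [1, 1]
        : tensor<16x32xf32> into tensor<64x32xf32>
  }
} {mapping = [#iree_codegen.workgroup_mapping<x>]}

// After: the forall destination is expanded, the tile is inserted without the
// collapse (offsets adjusted for the expanded shape), and a single
// collapse_shape is emitted on the forall result.
%dest_exp = tensor.empty() : tensor<16x4x32xf32>
%1 = scf.forall (%j) = (0) to (64) step (16)
    shared_outs(%o = %dest_exp) -> (tensor<16x4x32xf32>) {
  %off = affine.apply affine_map<(d0) -> (d0 floordiv 4)>(%j)
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %tile into %o[%off, 0, 0] [4, 4, 32] [1, 1, 1]
        : tensor<4x4x32xf32> into tensor<16x4x32xf32>
  }
} {mapping = [#iree_codegen.workgroup_mapping<x>]}
%collapsed = tensor.collapse_shape %1 [[0, 1], [2]]
    : tensor<16x4x32xf32> into tensor<64x32xf32>
```

With the collapse_shape moved onto the forall result, the loop's destination can bufferize in place and the trailing collapse can later fold into a full-slice flow.dispatch.tensor.store.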